Abstract

We study the consistency of sample mean-variance portfolios of arbitrarily high dimension that are 
based on Bayesian or shrinkage estimation of the input parameters as well as weighted sampling. In 
an asymptotic setting where the number of assets remains comparable in magnitude to the sample
size, we characterize the estimation risk by providing deterministic equivalents of
the portfolio out-of-sample performance in terms of the underlying investment scenario. These
estimates provide a means of quantifying the amount of risk underestimation and return overestimation
of improved portfolio constructions beyond standard ones. As is well known for the latter, if not corrected, these
deviations lead to inaccurate and overly optimistic Sharpe-based investment decisions. Our results are
based on recent contributions in the field of random matrix theory. Along with the asymptotic analysis, 
the analytical framework allows us to find bias corrections improving on the achieved out-of-sample 
performance of typical portfolio constructions. Some numerical simulations validate our theoretical 
findings.

I. Introduction

A. Background and research motivations 

The foundations of modern portfolio theory were laid by Markowitz's ground-breaking article [1],
where the idea of diversifying a portfolio by spreading bets across a universe of risky financial assets
was refined and generalized by the more sophisticated one of combining the assets so as to optimize
the risk-return tradeoff. In practice, Markowitz's mean-variance optimization framework for solving the
canonical wealth allocation problem relies on the statistical estimation of the unknown expected values 
and covariance matrix of the asset returns from sample market observations. 

In general, the uncertainty inherently associated with imperfect moment estimates represents a major
drawback in the application of the classical Markowitz framework. Indeed, the optimal mean-variance 
solution has been empirically observed to be significantly sensitive to deviations from the true input 
parameters. In addition, and aside from computational complexity issues, the estimation of the parameters 
is involved, mainly due to the instability of the parameter estimates through time. Generally, estimates 
of the covariance matrix are more stable than those of the mean returns, and so many studies disregard 
the estimation of the latter and concentrate on improving the sample performance of the so-called global 
minimum variance portfolio (GMVP); see arguments in [2].

In the financial literature, the previous source of portfolio performance degradation is referred to as
estimation risk. Especially when the number of securities is comparable to the number of observations,
estimation errors may in fact prevent the mean-variance optimization framework from being of any
practical use. In fact, for severe levels of estimation risk, the naive portfolio allocation rule, namely
the one obtained by equally weighting the assets without incorporating any knowledge about their mean and
covariance, turns out to represent a firm candidate choice [3]. The consistency and distributional properties
of sample optimal mean-variance portfolios and their Sharpe ratio performance have been analyzed and
characterized for finite samples and asymptotically (see, most recently, [4], [5], [6], and also the list of
references therein for earlier contributions).

Commencing with particularly high activity and contribution levels in the 1980s, there exists a vast
literature on portfolio selection methods accounting for estimation risk by explicitly dealing with the
lack of robustness and stability of the sample optimal mean-variance solution, which we do not intend
to exhaustively review here; we refer the reader to [7], [8] for a thorough treatment of the subject.
Some remarks on the two main lines of approach are in order. One class of methods based on convex
analysis and nonlinear optimization techniques focuses on formulations of the allocation problem where
robustness to estimation errors is achieved by means of the explicit modeling of parameter uncertainty
regions. Conceptually, assuming worst-case bounds on the input parameters may not be effective in 
practice since no information is available about the distribution of the estimated parameters within the 
uncertainty boundaries. 

On the other hand, instances of a second family of methods of statistical or probabilistic nature
are approaches based on Bayesian and Steinian shrinkage estimation, which seek efficiency by weighting
a sensible prior belief and the classical sample estimator in inverse proportion to their dispersion (see,
e.g., [9], [10]). As a matter of fact, this class of techniques provides a rather general framework for
understanding different forms of portfolio corrections and performance improvements tackling estimation
risk. Indeed, explicit links have been found between the latter and the robust optimization solutions
introduced above, which can be interpreted from a shrinkage estimation perspective [11].
Furthermore, constraining¹ the portfolio weights has been additionally noted to be equivalent to adding
some structure to the covariance estimation problem as obtained through Bayesian or shrinkage-based
procedures [2]. Arguably, the latter constitutes an effective way of avoiding overfitting the sample
data and of improving the stability of the realized portfolio solution out of sample and over time (see also
[12], where the authors investigate the effects of norm constraints on the solution of the weight vector).
The application of Bayesian and shrinkage approaches is not limited to the moment estimation problem
alone, but can indeed be extended to incorporate any prior belief directly on the portfolio weights.
Linear shrinkage solutions optimally combining different portfolio allocation rules, such as the GMVP,
the portfolio with equal weights and the tangency portfolio (cf. Section II), have been reported in [13],
[14], [15], [16].

Alternative approaches have been based on resampling techniques [17], [18], as well as stochastic
programming and also robust estimation, where the emphasis is on robustifying estimators that are efficient
under the assumption of Gaussian asset returns, and which are usually highly sensitive to deviations from
the distributional assumption (see, e.g., [19] and references therein). Finally, a line of contributions from
statistical physics initiated by [20], [21] has reported on a methodology based on random matrix
theory that consists of preserving the stability over time of the covariance matrix estimator by filtering
noisy eigenvalues conveying no valuable information. The cleaning mechanism relies on the empirical
fact that relevant information is structurally captured by a few eigenvalues, while the rest can be
ascribed to noise and measurement errors and resemble the spectrum of a white covariance matrix (see
also [22]).

¹Usual constraints on the allocation weights that are typically considered in the portfolio construction process are those
modelling the self-financing characteristic of the investment rule, as well as budget and short-selling restrictions.

In this paper, we are interested in the class of structured portfolio estimators based on the combination 
of Bayesian or James-Stein shrinkage and sample weighting. Motivated by the widespread application 
of this class of statistical methods in the practice of portfolio and risk management, our focus is on the 
performance of portfolio constructions as a function of the set of weights as well as the shrinkage targets 
and intensity coefficients parameterizing the improved moment forecasts. The extension of the statistical 
performance analysis of sample optimal portfolios with standard moment estimates to the case of improved 
shrinkage estimators is not straightforward. We concentrate on the consistency analysis by considering 
a limiting regime that is defined by both the number of samples and the portfolio dimension going to 
infinity at the same rate. Such an asymptotic setting will prove to be more convenient to characterize 
realistic, finite-dimensional practical conditions, where sample-size and number of assets are comparable 
in magnitude. In particular, we resort to some recent results from the theory of the spectral analysis
of large random matrices, which, as in [18] and contrary to the random matrix theoretical contributions
from statistical physics cited above, are based on Stieltjes transform methods and stochastic convergence
theory.

Before outlining the contributions and structure of the work, we draw some connections between the
subject of the paper and classical methods in the statistical signal processing literature. As a matter of
fact, (1) encompasses a broad range of system configurations described by the general vector channel
model. In fact, as for the mean-variance portfolio optimization problem, usual linear filtering schemes
solving typical signal waveform estimation and detection problems in sensor array processing and wireless
communications are based on the estimation of the unknown observation covariance matrix as well as
possibly a vector of cross-correlations with a pilot training sequence. Prominent examples are the Capon
or minimum variance spatial filter as well as the minimum mean-square error beamformer and detector
[23], [24], and also adaptive filtering² applications [26], in all of which both Bayesian and regularization
(shrinkage) methods are widely applied. Indeed, robust methods are similarly well known and extensively
used in signal processing applications (see examples in, e.g., [27], [28]). In particular, norm constraints
have been extensively investigated in the sensor array signal processing literature (see, e.g., [29]). Finally,
analyses of weighted sample estimators of covariance matrices can be found in [30], [31] and applications
of the bootstrap in [32].

²In particular, typical formulations of this problem based on (weighted) least-squares regression are intimately related to the
passive investment strategy of index tracking (see, e.g., [25, Chapter 4]).

B. Contributions and structure of the work 

The main contributions of the paper are as follows. We first characterize the consistency of sample 
mean-variance portfolios based on the aforementioned improved moment estimators by providing asymp- 
totic deterministic equivalents of the achieved out-of-sample performance in the more meaningful double- 
limit regime introduced above. Our analytical framework allows us to quantify and better understand the 
impact of estimation errors on the out-of-the-sample performance of optimal portfolios. Specifically, we 
provide a precise quantitative description of the amount of risk underestimation and return overestimation 
of portfolio constructions based on improved estimators, in a way depending on the ratio of the portfolio 
dimension to sample-size as well as the underlying investment scenario. These phenomena, which render
overly optimistic any investment assessment and decision based on estimated Sharpe ratios, have already
been observed in the financial literature for standard portfolio implementations.

Furthermore, we propose a class of mean-variance portfolio estimators defined in terms of a set of 
weights and shrinkage parameters calibrated so as to optimize the achieved out-of-sample performance. 
In essence, an optimal parameterization is obtained by effectively correcting the analytically derived 
asymptotic deviations of the performance of sample portfolios. 

The structure of the work is as follows. After the brief literature account and introductory research 
motivations in this section, Section II introduces the modeling details and the moment forecasting schemes
considered in this paper. The problem of evaluating the out-of-sample performance of large portfolios is
also explained. In Section III-A, we provide a characterization of the performance of improved estimators
based on sample weighting and James-Stein shrinkage. Observed deviations from optimal performance
are corrected in Section III-B, where we propose a class of improved portfolios for high-dimensional
settings. Section IV presents some simulation work validating our theoretical findings and Section V
concludes the contribution by summarizing the paper. Technical results and proofs are relegated to the 
appendices. 

II. Data model and problem formulation 

Consider the time series with the logarithmic differences of the prices of $M$ financial assets at the
edges of an investment period with time-horizon $t$. Generally enough, we can define the data generating
process of the previous compound or log returns by the following vector stochastic process³:

\[
\mathbf{y}_t = \boldsymbol{\mu}_t + \boldsymbol{\varepsilon}_t, \qquad
\boldsymbol{\varepsilon}_t = \boldsymbol{\Sigma}_t^{1/2} \mathbf{x}_t, \tag{1}
\]

where $\boldsymbol{\mu}_t$ and $\boldsymbol{\Sigma}_t$ are the expected value and covariance matrix of the asset returns over the investment
period, and $\mathbf{x}_t$ is a random vector with independent and identically distributed (i.i.d.) entries having mean
zero and variance one. We are interested in the problem of optimal single-period (static) mean-variance
portfolio selection, which can be mathematically formulated as the following quadratic optimization
problem with linear constraints:

\[
\begin{aligned}
\min_{\mathbf{w}_t} \quad & \mathbf{w}_t^T \boldsymbol{\Sigma}_t \mathbf{w}_t \\
\text{s.t.} \quad & \mathbf{w}_t^T \boldsymbol{\mu}_t = \mu_d \\
& \mathbf{w}_t^T \mathbf{1}_M = 1,
\end{aligned} \tag{2}
\]

where $\mu_d$ represents the target or desired level of expected portfolio return⁴, and $\mathbf{w}_t^T \mathbf{1}_M = 1$ is a budget
constraint.

We shall assume without loss of generality that the forecasting sampling frequency coincides with the 
rebalancing frequency. In particular, mean vector and covariance matrix are forecasted with the return 
data over a prescribed estimation window up to the time of the investment decision. Since we only 
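consider the case of a single-period investment horizon, in the sequel we will omit the subscript and let
$\mathbf{w}_t = \mathbf{w}$ for notational convenience. The solution to (2) is straightforwardly given by

\[
\mathbf{w} = \frac{C - \mu_d B}{AC - B^2} \boldsymbol{\Sigma}_t^{-1} \mathbf{1}_M
+ \frac{\mu_d A - B}{AC - B^2} \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t, \tag{3}
\]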

³Notation: All vectors are defined as column vectors and designated with bold lower case; all matrices are given in bold
upper case; for both vectors and matrices a subscript will be added to emphasize dependence on dimension, though it will
be occasionally dropped for the sake of clarity of presentation; $[\cdot]_j$ will be used for the $j$th entry of a vector; $(\cdot)^T$ denotes
transpose; $\mathbf{I}_M$ denotes the $M \times M$ identity matrix; $\mathbf{1}_M$ denotes an $M$ dimensional vector with all entries equal to one; $\mathrm{tr}[\cdot]$
denotes the matrix trace operator; $\mathbb{R}$ and $\mathbb{C}$ denote the real and complex fields of dimension specified by a superscript; $\mathrm{Im}\{z\}$
denotes imaginary part of the complex argument; $\mathbb{C}^+ = \{z \in \mathbb{C} : \mathrm{Im}\{z\} > 0\}$; $\mathbb{E}[\cdot]$ denotes
expectation; given two quantities $a, b$, $a \asymp b$ will denote both quantities are asymptotic equivalents, i.e., $|a - b| \overset{a.s.}{\to} 0$, with a.s.
denoting almost sure convergence; $K$, $K_p$ denote constant values not depending on any relevant quantity, apart from the latter
on a parameter $p$; $|\cdot|$ denotes absolute value and $\|\cdot\|$ denotes the Euclidean norm for vectors and the induced norm for matrices
(i.e., spectral or strong norm), whereas $\|\cdot\|_F$ denotes Frobenius norm, i.e., for a matrix $\mathbf{A} \in \mathbb{R}^{M \times M}$ with eigenvalues denoted by
$\lambda_m(\mathbf{A})$, $m = 1, \ldots, M$, such that $\lambda_M(\mathbf{A}) \le \lambda_{M-1}(\mathbf{A}) \le \ldots \le \lambda_1(\mathbf{A})$, and spectral radius $\rho(\mathbf{A}) = \max_{1 \le m \le M}(|\lambda_m|)$,
$\|\mathbf{A}\| = (\rho(\mathbf{A}^T \mathbf{A}))^{1/2}$, $\|\mathbf{A}\|_F = (\mathrm{Tr}[\mathbf{A}^T \mathbf{A}])^{1/2}$ and $\|\mathbf{A}\|_{\mathrm{tr}} = \mathrm{Tr}[(\mathbf{A}^T \mathbf{A})^{1/2}]$.

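⁴As is conventional, and for the sake of clarity of presentation, we will assume that logarithmic returns are well approximated
by their linear counterparts, so that we can rely on the additivity of returns both across portfolio assets and intertemporally.

where $A = \mathbf{1}_M^T \boldsymbol{\Sigma}_t^{-1} \mathbf{1}_M$, $B = \mathbf{1}_M^T \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t$ and $C = \boldsymbol{\mu}_t^T \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t$. In particular, if the constraint on the level
of return achieved is dropped, then we obtain the so-called global minimum variance portfolio (GMVP),
which is given by

\[
\mathbf{w}_{\mathrm{GMVP}} = \frac{\boldsymbol{\Sigma}_t^{-1} \mathbf{1}_M}{\mathbf{1}_M^T \boldsymbol{\Sigma}_t^{-1} \mathbf{1}_M}. \tag{4}
\]

In fact, the latter is clearly also the solution to the general mean-variance problem if $\boldsymbol{\mu}_t = \mathbf{0}$, as it is often
assumed over short investment periods. Another special case of particular interest due to its implications
in asset pricing theory is that of the tangency portfolio (TP), which is given by

\[
\mathbf{w}_{\mathrm{TP}} = \frac{\boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t}{\mathbf{1}_M^T \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t}. \tag{5}
\]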

In practice, $\boldsymbol{\mu}_t$ and $\boldsymbol{\Sigma}_t$ are unknown and so they must be estimated from market data observations. Let
$\hat{\boldsymbol{\mu}}_t$ and $\hat{\boldsymbol{\Sigma}}_t$ denote the forecasted values of the mean vector and the covariance matrix, respectively.
Moreover, let $\hat{\mathbf{w}}_{\mathrm{GMVP}}$ and $\hat{\mathbf{w}}_{\mathrm{TP}}$ represent the sample construction of (4) and (5), respectively, based on
the previous moment estimates. In the following, we briefly elaborate on the classical forecasting settings 
that are customarily applied to estimate the input parameters of the Markowitz portfolio optimization 
framework. Specifically, we consider in the first place the conventional assumption according to which 
the returns over consecutive investment periods are independent and identically distributed, and the 
two required moments are obtained by their respective unconditional estimators. Then, we turn our 
attention to conditional forecasting models based on linear and stationary stochastic processes; finally, 
we shortly comment on heteroscedastic models allowing for some time-variability of the multivariate 
volatility process. 
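
As a rough illustration of the closed-form portfolio rules introduced earlier in this section, the following Python sketch (using NumPy) computes the mean-variance solution for a target return together with the GMVP and the tangency portfolio, using the standard expressions in terms of the scalars A, B and C. The function names, the three-asset inputs and the target return of 0.08 are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def mean_variance_weights(mu, sigma, mu_d):
    """Closed-form mean-variance weights for a target return mu_d (budget constraint 1'w = 1)."""
    ones = np.ones(len(mu))
    si_ones = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1
    si_mu = np.linalg.solve(sigma, mu)      # Sigma^{-1} mu
    a = ones @ si_ones                      # A = 1' Sigma^{-1} 1
    b = ones @ si_mu                        # B = 1' Sigma^{-1} mu
    c = mu @ si_mu                          # C = mu' Sigma^{-1} mu
    d = a * c - b**2
    return ((c - mu_d * b) * si_ones + (mu_d * a - b) * si_mu) / d


def gmvp_weights(sigma):
    """Global minimum variance portfolio."""
    ones = np.ones(sigma.shape[0])
    si_ones = np.linalg.solve(sigma, ones)
    return si_ones / (ones @ si_ones)


def tangency_weights(mu, sigma):
    """Tangency portfolio, normalized to satisfy the budget constraint."""
    ones = np.ones(len(mu))
    si_mu = np.linalg.solve(sigma, mu)
    return si_mu / (ones @ si_mu)


# Hypothetical three-asset example (values chosen only for illustration).
mu = np.array([0.05, 0.08, 0.12])
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

w_mv = mean_variance_weights(mu, sigma, mu_d=0.08)
w_gmvp = gmvp_weights(sigma)
w_tp = tangency_weights(mu, sigma)
print("MV  :", w_mv)
print("GMVP:", w_gmvp)
print("TP  :", w_tp)
```

In practice the true moments would be replaced by their forecasts, which is precisely where estimation risk enters.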

Before proceeding further, let us introduce some useful notation. We will denote by $\{\mathcal{F}_{t-1}\}$ the
information set of events up to the discrete-time instant $t - 1$, i.e., the $\sigma$-field generated by the observed
series $\{\mathbf{y}_l\}_{l \le t-1}$. Conditional on the observations available up to the investment decision time, the covariance
matrix of the stochastic process is given by definition by $\boldsymbol{\Sigma}_t = \mathrm{var}(\mathbf{y}_t \,|\, \mathcal{F}_{t-1}) = \mathrm{var}(\boldsymbol{\varepsilon}_t \,|\, \mathcal{F}_{t-1})$.
Additionally, we let $\mathbf{Y}_N = [\mathbf{y}_{t-N}, \ldots, \mathbf{y}_{t-1}]$ denote the sample data matrix with the $N$ past return
observations.

A. The case of IID returns: weighted sampling and shrinkage estimation 

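Under the classical assumption of i.i.d. returns, mean vector and covariance matrix are both modeled
as constant over the entire estimation interval, i.e., $\boldsymbol{\mu}_l = \boldsymbol{\mu}$, $\boldsymbol{\Sigma}_l = \boldsymbol{\Sigma}$, $l = t - N, \ldots, t - 1$. Hence, the
standard forecasts of the moments are given in terms of a rolling window by the (unconditional) sample
mean and sample covariance matrix, i.e., respectively,

\[
\hat{\boldsymbol{\mu}} = \frac{1}{N} \sum_{n=t-N}^{t-1} \mathbf{y}_n = \frac{1}{N} \mathbf{Y}_N \mathbf{1}_N \tag{6}
\]

and

\[
\hat{\boldsymbol{\Sigma}} = \frac{1}{N} \sum_{n=t-N}^{t-1} (\mathbf{y}_n - \hat{\boldsymbol{\mu}}) (\mathbf{y}_n - \hat{\boldsymbol{\mu}})^T
= \frac{1}{N} \mathbf{Y}_N \left( \mathbf{I}_N - \frac{1}{N} \mathbf{1}_N \mathbf{1}_N^T \right) \mathbf{Y}_N^T. \tag{7}
\]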

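A classical extension of the standard estimators in (6) and (7) considers the effect of weighting the
sample observations. Let $\mathbf{W}_{\mu,N} \in \mathbb{R}^{N \times N}$ and $\mathbf{W}_{\Sigma,N} \in \mathbb{R}^{N \times N}$ be two diagonal matrices with entries
given by a set of nonnegative coefficients, respectively, $w_{\mu,n}$ and $w_{\Sigma,n}$, $n = 1, \ldots, N$. Specifically, the
weighted sample mean and weighted sample covariance matrix are respectively defined as

\[
\hat{\boldsymbol{\mu}}_W = \frac{1}{N} \sum_{n=t-N}^{t-1} w_{\mu,n} \mathbf{y}_n = \frac{1}{N} \mathbf{Y}_N \mathbf{W}_{\mu,N} \mathbf{1}_N \tag{8}
\]

and

\[
\begin{aligned}
\hat{\boldsymbol{\Sigma}}_W &= \frac{1}{N} \sum_{n=t-N}^{t-1} w_{\Sigma,n} (\mathbf{y}_n - \hat{\boldsymbol{\mu}}_W) (\mathbf{y}_n - \hat{\boldsymbol{\mu}}_W)^T \\
&= \frac{1}{N} \mathbf{Y}_N \left( \mathbf{I}_N - \frac{1}{N} \mathbf{W}_{\mu,N} \mathbf{1}_N \mathbf{1}_N^T \right) \mathbf{W}_{\Sigma,N}
\left( \mathbf{I}_N - \frac{1}{N} \mathbf{1}_N \mathbf{1}_N^T \mathbf{W}_{\mu,N} \right) \mathbf{Y}_N^T. \tag{9}
\end{aligned}
\]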
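
The weighted estimators can be sketched in a few lines of NumPy. The snippet below is a minimal illustration, assuming the returns are stored as the columns of an M x N array and the weights as length-N vectors holding the diagonals of the two weighting matrices; the function name `weighted_moments` and the simulated data are hypothetical. With unit weights the standard sample mean and sample covariance matrix (normalized by N) are recovered.

```python
import numpy as np


def weighted_moments(Y, w_mu, w_sigma):
    """Weighted sample mean and covariance of the columns of Y (shape M x N).

    w_mu and w_sigma are length-N vectors with the diagonal entries of the
    weighting matrices for the mean and the covariance, respectively.
    """
    M, N = Y.shape
    ones = np.ones(N)
    mu_w = Y @ w_mu / N
    # Weighted centering matrix I_N - (1/N) W_mu 1 1^T.
    P = np.eye(N) - np.outer(w_mu, ones) / N
    sigma_w = Y @ P @ np.diag(w_sigma) @ P.T @ Y.T / N
    return mu_w, sigma_w


rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 50))  # simulated returns, for illustration only
N = Y.shape[1]

# Unit weights recover the standard estimators.
w_uniform = np.ones(N)
mu_u, sigma_u = weighted_moments(Y, w_uniform, w_uniform)

# Arbitrary nonnegative weights (illustrative choice).
w_mu = rng.uniform(0.5, 1.5, N)
w_sigma = rng.uniform(0.5, 1.5, N)
mu_w, sigma_w = weighted_moments(Y, w_mu, w_sigma)
print(mu_w)
```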

Weighted estimators are usually applied in order to reduce variability and improve the stability of
parameter estimators, for instance by using stratified random sampling [33]. A related structure is the
one obtained by the nonparametric bootstrap, for which the weights represent the number of times the
corresponding observation appears in the bootstrap sample [34]⁵. In the context of asset allocation, [37]
(see also [17]) suggests averaging a sequence of portfolios obtained by resampling with replacement from
the originally available sample. Regarded as bootstrap aggregating or bagging, such averages are used in
statistics for variance reduction purposes as well as to stabilize the out-of-sample prediction performance
as a remedy to overfitting (see, e.g., Chapter 10 in [38]). In particular, the bootstrap is typically used
to provide small-sample corrections for possibly consistent but biased estimators. However, in high-
dimensional settings, the standard application of the bootstrap generally yields inconsistent estimates of
bias. An asymptotic refinement of the conventional bootstrap-based bias correction (see, e.g., [39] for
standard methodology) is provided in [18] by resorting to random matrix theoretical results.

⁵We assume that the choice of weights is given; possible weighting schemes range from the standard simple random sampling
with replacement (i.e., uniform resampling following a multinomial distribution) to sampling from the empirical distribution of
the asset returns with nonuniform weights by for instance assigning different resampling probabilities to the different observations
using importance sampling (see, e.g., [35], [36] for more details).

A common further extension to (possibly weighted) sample estimation relies on the widespread family
of Steinian (James-Stein-type) shrinkage estimators of the mean and covariance matrix of the observed
samples. By means of regularizing or shrinking the estimators (8) and (9), we define:

\[
\hat{\boldsymbol{\mu}}_{\mathrm{SHR}} = (1 - \delta) \hat{\boldsymbol{\mu}}_W + \delta \boldsymbol{\mu}_0 \tag{10}
\]

and

\[
\hat{\boldsymbol{\Sigma}}_{\mathrm{SHR}} = (1 - \rho) \hat{\boldsymbol{\Sigma}}_W + \rho \boldsymbol{\Sigma}_0, \tag{11}
\]

where the nonrandom vector $\boldsymbol{\mu}_0$ and the positive matrix $\boldsymbol{\Sigma}_0$ are the shrinkage targets or, from a Bayesian
perspective, the prior knowledge about the unknown $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, respectively, and $\delta$ and $\rho$ are the shrinkage
intensity parameters. Clearly, if the shrinkage intensity parameters are equal to 0 and $\mathbf{W}_{\mu,N} = \mathbf{W}_{\Sigma,N} =
\mathbf{I}_N$, then the standard sample estimators are recovered. A typical example of shrinkage target for the
covariance estimation is $\boldsymbol{\Sigma}_0 = \mathbf{I}_M$. Shrinkage estimators in the context of portfolio optimization were
first proposed in [40] (see also [10] for the covariance matrix, and [41] for a study of the combination
of resampling and shrinkage).
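
To make the effect of shrinkage concrete, the sketch below applies the two linear shrinkage rules to the standard sample moments in a setting with more assets than observations, where the sample covariance matrix is singular. It is only an illustration under stated assumptions: the function names, the intensities 0.3 and 0.2, the zero mean target and the simulated data are hypothetical choices, while the identity covariance target is the typical example mentioned in the text.

```python
import numpy as np


def shrink_mean(mu_w, mu_0, delta):
    """Linear shrinkage of a mean estimate towards the target mu_0."""
    return (1.0 - delta) * mu_w + delta * mu_0


def shrink_cov(sigma_w, sigma_0, rho):
    """Linear shrinkage of a covariance estimate towards the target sigma_0."""
    return (1.0 - rho) * sigma_w + rho * sigma_0


rng = np.random.default_rng(0)
M, N = 20, 10                      # more assets than observations
Y = rng.standard_normal((M, N))    # simulated returns, for illustration only

mu_hat = Y.mean(axis=1)
sigma_hat = np.cov(Y, bias=True)   # singular, since its rank is at most N - 1

mu_shr = shrink_mean(mu_hat, np.zeros(M), delta=0.3)
sigma_shr = shrink_cov(sigma_hat, np.eye(M), rho=0.2)

print("rank of sample covariance:", np.linalg.matrix_rank(sigma_hat))
print("smallest eigenvalue after shrinkage:", np.linalg.eigvalsh(sigma_shr).min())
```

Shrinking towards the identity lifts every eigenvalue of the estimate to at least the shrinkage intensity, so the shrunk covariance matrix can be inverted even when the sample one cannot.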

As mentioned in the introduction, it has been recognized in the financial literature that, under severe 
estimation risk conditions, the estimated Markowitz's optimal portfolio rule and its various sophisticated 
extensions underperform out-of-the-sample the naive rule based on the equally weighted portfolio (EWP) 
choice. In an effort to incorporate this well-known fact into the portfolio selection process, some authors 
have considered optimizing a combination⁶ of one or more sample portfolios, such as $\hat{\mathbf{w}}_{\mathrm{GMVP}}$ and $\hat{\mathbf{w}}_{\mathrm{TP}}$,
and the uniformly weighted asset allocation given by $\mathbf{w}_{\mathrm{EWP}} = \mathbf{1}_M / M$ (see [3], [13], [16]).

B. Accounting for serial dependence: conditional models 

The previous unconditional estimators of the moments of the asset returns are particularly well-suited 
for situations of static nature. Under a more general setting challenging the i.i.d. assumption, although a 
period-by-period computation of the sample statistics by means of a rolling window can indeed allow for 
some return predictability, the dynamic behavior of the input parameters is best modeled in practice by 
taking into account conditional information. For the sake of a more precise motivation, we first recall some
empirically observed properties or attributes of time series of asset returns, the so-called stylized facts in
the theory and practice of finance (see [43], and also [44] for a textbook exposition)⁷. Concerning their
distributional properties, it has been observed that return series are leptokurtic or heavy-tailed (except for
long time intervals, for which the log-normal assumption seems reasonable, at least for well-diversified
portfolios), and extreme return values usually appear in clusters. Regarding their dynamics, conditional
expected returns are usually negligible (at least relative to volatility values), and, more importantly, are
not independent, though they exhibit little serial correlation. Conversely, squared returns, which are often used
as a proxy of the unobserved covariance, show profound evidence of positive serial correlation with high
persistence.

⁶The rationale behind this approach lies in the so-called fund-separation theorems in finance (see [42]).

If we set aside the time variability of conditional covariances (i.e., particularly for long-term horizons),
the dynamic dependence structure of the asset returns can be captured, irrespective of whether its origin
is momentum, mean-reversion, or lead-lag relations, by conditionally modeling the mean via a vector
auto-regressive moving-average (VARMA) process with both orders equal to one:

\[
\mathbf{y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}_M \mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t - \boldsymbol{\Pi}_M \boldsymbol{\varepsilon}_{t-1}, \tag{12}
\]

where $\boldsymbol{\Phi}_M$ and $\boldsymbol{\Pi}_M$ are square fixed parameter matrices, and $\boldsymbol{\varepsilon}_t = \boldsymbol{\Sigma}^{1/2} \mathbf{x}_t$. The process is customarily
assumed to be weakly (second-order or covariance) stationary and ergodic, as well as stable and invertible
(see [45] for a detailed characterization of multivariate time series models). VARMA processes of higher
orders than the VARMA(1, 1) in (12) are reported in the literature to be of less practical interest [46],
and even further restrictions leading to first-order vector autoregressions (i.e., $\boldsymbol{\Pi}_M = \mathbf{0}$) are most often
considered (see [47], and the more recent account in [48]). In the large dimensional portfolio setting,
parsimony is crucial to maintain the efficiency and low complexity of the model estimation process,
and so different simplifications based on structural restrictions are usually considered in practice. In
particular, in the case of processes with scalar parameter matrices $\boldsymbol{\Phi}_M = \phi \mathbf{I}_M$ and $\boldsymbol{\Pi}_M = \pi \mathbf{I}_M$,
the population covariance matrix has a particular sparse structure, separable into cross-sectional $\boldsymbol{\Sigma}$ and
temporal covariance components, which we will denote by $\mathbf{C}_t$. Under the Gaussian assumption, the
sample covariance matrices of members of this class of VARMA processes are doubly-correlated Wishart
matrices. This matrix ensemble has been recently analyzed in the statistical physics literature in the
context of financial applications [49].

⁷Although these facts approximately hold unchanged for different time intervals, some characteristics might arguably vary
depending on the sampling frequency. According to the time elapsed between return observations, one might differentiate among
long-term returns (e.g., weekly, monthly or yearly returns) and short-term returns (i.e., daily returns); we have purposely omitted
a further category including high-frequency data (i.e., intraday, tick data), as it requires different statistical methodologies which
we will not consider in this work.

According to the VARMA model, the conditional covariance remains constant regardless of the data.
Especially for short-term horizons (e.g., daily returns), the observed features of the volatility process
are best accounted for by conditional heteroscedastic models, such as the class of specifications for the
multivariate extension of the generalized autoregressive conditionally heteroscedastic (GARCH) process [50],
and the exponentially weighted moving average (EWMA) scheme:

\[
\hat{\boldsymbol{\Sigma}}_t = \lambda \mathbf{y}_{t-1} \mathbf{y}_{t-1}^T + (1 - \lambda) \hat{\boldsymbol{\Sigma}}_{t-1}
= \lambda \sum_{n=t-N}^{t-1} (1 - \lambda)^{t-n-1} \mathbf{y}_n \mathbf{y}_n^T, \tag{13}
\]

where $\lambda$ is a prescribed smoothing parameter that characterizes the decay of the exponential memory⁸.
First proposed in [51], the EWMA model has been found very useful in estimating the market risk of
portfolios, as well as in portfolio optimization [52].
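
The two forms of the EWMA scheme, the one-step recursion and the exponentially weighted sum over the estimation window, can be compared numerically. The following sketch assumes that the columns of the data array are ordered from the oldest to the most recent observation and that the smoothing parameter multiplies the newest outer product, as in the recursion above; the function names, the value 0.06 and the simulated data are hypothetical. Started from a zero matrix, the recursion reproduces the windowed sum exactly; any other starting value contributes a term that decays geometrically with the window length.

```python
import numpy as np


def ewma_cov_recursive(Y, lam, sigma_init):
    """Run Sigma <- lam * y y^T + (1 - lam) * Sigma over the columns of Y (oldest first)."""
    sigma = sigma_init.copy()
    for y in Y.T:
        sigma = lam * np.outer(y, y) + (1.0 - lam) * sigma
    return sigma


def ewma_cov_window(Y, lam):
    """Exponentially weighted sum of outer products over the N columns of Y (oldest first)."""
    N = Y.shape[1]
    # The most recent column gets exponent 0, the oldest gets exponent N - 1.
    weights = lam * (1.0 - lam) ** np.arange(N - 1, -1, -1)
    return (Y * weights) @ Y.T


rng = np.random.default_rng(0)
Y = rng.standard_normal((3, 40))   # simulated returns, for illustration only
lam = 0.06                         # illustrative smoothing parameter

rec = ewma_cov_recursive(Y, lam, np.zeros((3, 3)))
win = ewma_cov_window(Y, lam)
print(np.max(np.abs(rec - win)))   # the two forms agree up to rounding
```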

C. Evaluating the performance of sample portfolios 

The quality of a portfolio rule $\mathbf{w}$ constructed based on in-sample forecasts of $\boldsymbol{\mu}_t$ and $\boldsymbol{\Sigma}_t$ can be measured
by its achieved out-of-sample (realized) mean return $\mu_P(\mathbf{w}) = \mathbf{w}^T \boldsymbol{\mu}_t$ and risk $\sigma_P(\mathbf{w}) = \sqrt{\mathbf{w}^T \boldsymbol{\Sigma}_t \mathbf{w}}$. In
the study and practice of finance, measures of risk-adjusted achieved return are usually employed, the
Sharpe ratio being a prominent one in portfolio management:

\[
\mathrm{SR}(\mathbf{w}) = \frac{\mu_P(\mathbf{w})}{\sigma_P(\mathbf{w})}. \tag{14}
\]

In particular, notice that the tangency portfolio defined in (5) is the portfolio that maximizes (14) under
the budget constraint.
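
The gap between estimated and realized performance can be illustrated with a small simulation. The sketch below is only indicative and rests on assumptions that are not taken from the paper: i.i.d. Gaussian returns with identity covariance, a constant mean vector, M = 50 assets and N = 100 observations, and a sample tangency portfolio built from the standard sample moments. It evaluates the Sharpe ratio and the risk of the sample portfolio once with the estimated moments (in-sample) and once with the true moments (out-of-sample).

```python
import numpy as np


def sharpe_ratio(w, mu, sigma):
    """Portfolio mean return divided by portfolio risk."""
    return (w @ mu) / np.sqrt(w @ sigma @ w)


def tangency_weights(mu, sigma):
    """Tangency portfolio, normalized to satisfy the budget constraint."""
    si_mu = np.linalg.solve(sigma, mu)
    return si_mu / si_mu.sum()


rng = np.random.default_rng(0)
M, N = 50, 100                     # ratio M/N = 0.5
mu = np.full(M, 0.1)               # assumed true mean vector
sigma = np.eye(M)                  # assumed true covariance matrix
Y = mu[:, None] + rng.standard_normal((M, N))   # simulated return sample

mu_hat = Y.mean(axis=1)
sigma_hat = np.cov(Y, bias=True)
w_hat = tangency_weights(mu_hat, sigma_hat)

sr_estimated = sharpe_ratio(w_hat, mu_hat, sigma_hat)   # in-sample assessment
sr_realized = sharpe_ratio(w_hat, mu, sigma)            # out-of-sample performance
sr_optimal = sharpe_ratio(tangency_weights(mu, sigma), mu, sigma)

risk_estimated = np.sqrt(w_hat @ sigma_hat @ w_hat)
risk_realized = np.sqrt(w_hat @ sigma @ w_hat)

print("Sharpe ratio: estimated, realized, optimal =", sr_estimated, sr_realized, sr_optimal)
print("Risk: estimated, realized =", risk_estimated, risk_realized)
```

In this setting the estimated Sharpe ratio exceeds the best attainable one, while the realized Sharpe ratio falls below it and the estimated risk understates the realized risk.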

As discussed in the introduction, for a small sample-size and relatively large universe of assets, the 
out-of-sample performance of standard portfolio constructions can be expected to considerably differ from 
the theoretical performance given by the true moments. In this paper, we extend existing analyses of the 
statistical properties of portfolio rules based on the standard sample mean and sample covariance matrix 
estimators, and characterize the performance deviations due to estimation risk in terms of nonrandom 
model and scenario parameters. We will concentrate on the case of the unconditional moment estimators
(10) and (11) and conditional VARMA models with separable covariance structure. Specifically, we
derive asymptotic deterministic equivalents of the out-of-sample performance of improved portfolio
implementations that are based on the previous estimators. We remark here that usual choices among
practitioners of conditional heteroscedastic models, such as the EWMA model in (13), can also be fitted
into our asymptotic framework by resorting to random matrix theoretical results dealing with general
variance profiles (see [53]). Furthermore, we provide a mechanism to calibrate the set of weights and
shrinkage parameters defining the improved portfolio constructions so as to optimize the achieved out-
of-sample performance.

⁸Other possible and common choices of the weights for the past returns (adding up to 1) are equal weights (i.e., a rectangular
window with equal weights), exponential weights (i.e., equivalent to an exponential moving average), weights following a power-
law decay, or long memory weights (decaying logarithmically slowly).

III. Main results: Out-of-sample analysis and asymptotic corrections

In this section, we provide the two main results of the paper, namely the asymptotic characterization of the
performance of sample portfolios and the proposed family of generalized consistent portfolio estimators,
which are stated in Section III-A and Section III-B, respectively. We first summarize the technical hypotheses
supporting our research and introduce some new definitions:

(As1) Let $\mathbf{R}_M \in \mathbb{R}^{M \times M}$ and $\mathbf{T}_N \in \mathbb{R}^{N \times N}$ be two deterministic nonnegative matrices having spectral
norm bounded uniformly in $M$ and $N$, i.e., $\|\mathbf{R}_M\|_{\sup} = \sup_{M \ge 1} \|\mathbf{R}_M\| < +\infty$ and $\|\mathbf{T}_N\|_{\sup} =
\sup_{N \ge 1} \|\mathbf{T}_N\| < +\infty$, respectively; the matrix $\mathbf{T}_N$ is diagonal with entries denoted by $t_n$,
$1 \le n \le N$.

(As2) Let $\mathbf{X}_N$ be an $M \times N$ matrix whose elements $X_{ij}$, $1 \le i \le M$, $1 \le j \le N$, are i.i.d.
standardized Gaussian random variables.

(As3) We will consider the limiting regime defined by both dimensions $M$ and $N$ growing large
without bound at the same rate, i.e., $N, M \to \infty$ such that $0 < \liminf c_M \le \limsup c_M < \infty$,
with $c_M = M/N$. Quantities that, under the previous double-limit regime, are asymptotically
equivalent to a given random variable, both depending on $M$ and $N$, will be referred to
as asymptotic deterministic equivalents, if they only depend upon nonrandom model variables, and
generalized consistent estimators, if they depend on observable random variables (e.g., sample
data matrix).
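
A quick numerical experiment may help to see why the double-limit regime matters. The sketch below is an illustration based on the classical Marchenko-Pastur law, which is not stated in the text: for a Gaussian matrix with i.i.d. standardized entries and identity population covariance, the eigenvalues of the sample covariance matrix do not concentrate around one when the ratio M/N is fixed, but spread over an interval whose edges depend on that ratio. The dimensions and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 200, 800                    # ratio c = M/N = 0.25
c = M / N

X = rng.standard_normal((M, N))    # i.i.d. standardized Gaussian entries
eigvals = np.linalg.eigvalsh(X @ X.T / N)

# Edges of the Marchenko-Pastur support for unit population variance.
lower_edge = (1 - np.sqrt(c)) ** 2
upper_edge = (1 + np.sqrt(c)) ** 2

print("smallest and largest sample eigenvalue:", eigvals.min(), eigvals.max())
print("limiting support edges:", lower_edge, upper_edge)
```

Although every population eigenvalue equals one, the sample eigenvalues fill the whole limiting interval, which is the kind of systematic deviation that deterministic equivalents are designed to capture.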

Before proceeding with the out-of-sample performance characterization, we identify next the key
quantities of study into which the Sharpe ratio performance measure in (14) can be decomposed. Let
us first consider the unconditional model, where $\boldsymbol{\mu}_t = \boldsymbol{\mu}$ and $\boldsymbol{\Sigma}_t = \boldsymbol{\Sigma}$ are considered to be constant
over the estimation window. Then, notice that the data observation matrix can be written as $\mathbf{Y}_N = \boldsymbol{\mu} \mathbf{1}_N^T +
\boldsymbol{\Sigma}^{1/2} \mathbf{X}_N$, where $\mathbf{X}_N = [\mathbf{x}_{t-N}, \ldots, \mathbf{x}_{t-1}]$. For the sake of clarity of presentation, we will assume that
the standard sample mean in (6) instead of its weighted version is applied in the definition of $\hat{\boldsymbol{\Sigma}}_W$.
Moreover, we also assume that the entries of the diagonal matrix $\mathbf{T}_N$ are chosen to be the eigenvalues of the matrix
$\left( \mathbf{I}_N - \frac{1}{N} \mathbf{1}_N \mathbf{1}_N^T \right) \mathbf{W}_{\Sigma,N} \left( \mathbf{I}_N - \frac{1}{N} \mathbf{1}_N \mathbf{1}_N^T \right)$. In particular, observe that, in
this case, 

where we have used the fact that 

$$\left(\mu 1_N^T + \Sigma^{1/2} X_N\right)\left(I_N - \tfrac{1}{N} 1_N 1_N^T\right) = \Sigma^{1/2} X_N \left(I_N - \tfrac{1}{N} 1_N 1_N^T\right), \qquad (15)$$

along with the invariance of the multivariate Gaussian distribution to orthogonal transformations. No- 
tice that, under the Gaussian assumption, the matrix in [15] is matrix-variate normal distributed, i.e., 
S;i/2xAr (In - jflNl-N) ~ MMmxN (Ojv/xTV, ^, ^n), or equivalently, vec (S^/^Xat (I^v - ^IatI^)) 
A/jv/Af (Oa/at, S (g) Iat); see lHH Section 3.3.2]. Consequently, is a central quadratic forms (central 
Wishart distributed if Tat = Iat). 
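
The mean-removal identity (15) holds because $1_N^T (I_N - \frac{1}{N} 1_N 1_N^T) = 0$, so the term $\mu 1_N^T$ is annihilated by the centering projector. The following is a minimal numerical sketch of that fact; the function names, the dimensions and the randomly drawn $\mu$ and $\Sigma$ are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def centering_matrix(n):
    """Return I_N - (1/N) 1 1^T, the projector that removes the sample mean."""
    return np.eye(n) - np.ones((n, n)) / n

def mean_removal_error(M=5, N=12, seed=0):
    """Largest entrywise gap between both sides of the identity (15)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=(M, 1))                  # hypothetical mean vector
    G = rng.normal(size=(M, M))
    Sigma = G @ G.T + np.eye(M)                   # hypothetical covariance (positive definite)
    w, V = np.linalg.eigh(Sigma)
    Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T    # symmetric square root
    X = rng.normal(size=(M, N))
    Y = mu @ np.ones((1, N)) + Sigma_half @ X     # observation model Y = mu 1^T + Sigma^{1/2} X
    P = centering_matrix(N)
    return np.max(np.abs(Y @ P - Sigma_half @ X @ P))

err = mean_removal_error()
```

The returned gap should be at the level of floating-point round-off, whatever values are drawn for the mean.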

Now, let $R_M = \Sigma_0^{-1/2} \Sigma \Sigma_0^{-1/2}$ and also, with some abuse of notation, $R_M^{1/2} = \Sigma_0^{-1/2} \Sigma^{1/2}$, and
consider further the nonnegative scalars $\alpha_M = \rho / (1 - \rho)$ and $\beta_M = \delta / (1 - \delta)$. Moreover, we define
Yn = R],,(^XArT}/^, along with tiM = j^YNY^+aMliM, and vm = j^YnVn, where vn = VN'On, 
with V]\f being an N dimensional nonrandom vector with unit norm. Then, it is straightforward to see 
that the numerator and denominator of (14) can be written, for the class of sample implementations of
the optimal portfolio in (3) based on the unconditional estimators (10) and (11), in terms of the following
random variables:

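$$\xi_M^{(1)} = \upsilon_M^T \hat{\Sigma}_M^{-1} \upsilon_M, \qquad (16)$$
$$\xi_M^{(2)} = \hat{\upsilon}_M^T \hat{\Sigma}_M^{-1} \hat{\upsilon}_M, \qquad (17)$$
$$\xi_M^{(3)} = \upsilon_M^T \hat{\Sigma}_M^{-1} \hat{\upsilon}_M, \qquad (18)$$
$$\xi_M^{(4)} = \upsilon_M^T \hat{\Sigma}_M^{-1} R_M \hat{\Sigma}_M^{-1} \upsilon_M, \qquad (19)$$
$$\xi_M^{(5)} = \upsilon_M^T \hat{\Sigma}_M^{-1} R_M \hat{\Sigma}_M^{-1} \hat{\upsilon}_M, \qquad (20)$$
$$\xi_M^{(6)} = \hat{\upsilon}_M^T \hat{\Sigma}_M^{-1} R_M \hat{\Sigma}_M^{-1} \hat{\upsilon}_M. \qquad (21)$$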

Notice that similar reasoning applies to the conditional model for the asset return in ([T2] ). since the sample 
covariance matrix of the process is a doubly-correlated Wishart matrix (cf. Section III-BI ). and the condi- 
tional mean estimator can be written from the VAR(l) model specification as fi^ = p, + I]^/^XArlA,Ar, 
where /i = J2n=i^^~^ P- ^^'^ '^A,N = ^n^n, with Aat G M^^^ being a diagonal matrix such that 

]^ /2 / 1/2 — 1/2 — 1/2 1 

Sq Iaz/vM, Sq /^,Sq Mo)^o /^(' 

and vn in |lAf/\/A', Ia^a^}- 
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By way of example, consider the estimation of the quantities {A,B,C} defining the optimal mean- 
variance portfolio in based on the unconditional estimators (ITOl) and (fTTI) with 6 = 0. Let us denote the 
estimators by |a, B, cj. In particular, observe that {1 — p) A = I^Sq ^5]q ^''^Sw^o + oa/Im^ 
and so we readily have {I — p) A = MS^l^ , with vm = / VM. Moreover, note that (W^ tvIa^ = 

xj^^lAT, where In = T^^^^W^^atIat) (;^l^W^,ArlAr = 1 -in definition above-) 

{l-p)B = (s-^/'SwSo + o^mImY^ ^o'^^P-w 



and therefore we have {1 — p) B = y/M^^^ + -y/M^^'', where the vector vm take values vm = 

]^ /2 / 1/2 ~ 

Xlg 1m/vM and i;m = 5]g /x, and V]\j = 1^. 

Furthermore, for the estimator of C, we have that (/iw = {f^'^Jf + ^^^^^A^) atIat = ;^/^1a^W^ jvIa 
^sV2x^W^,;vl7V = A* + ^sV2x;vT;v/2i7v) (||m||) 

, O 1 , ,Tv-l/2 /'v'-V2v^ -^-1/2,^ J \-l 1/2„i/2y rrl/27 
+ 2-/2 X,g (^i,g 2jW^o +aAfiAfj ^0 -^TvT^ liV 

— ?Af "T ?Af "T ?Af ■ 

Finally, notice that, additionally, the term is required to model the variance of the GMVP, and so is for 
modeling the return of the TP, but both terms can be straightforwardly represented similarly as |yl, i3, C*!- 

(1 - pf li',SsH'R5]EsHRlM = ^4^^ 



(1 — p)^ /i^Slg^pSSsHR/iyv — (,[J + i[J + i[J. 

From above, the previous two assumptions on the weighting matrices clearly facilitate exposition and 
tractability. However, we remark that more general cases can be equivalently reduced to the above key 
quantities by algebraic manipulations essentially relying on the matrix inversion lemma (cf. identity (27)
in Appendix A-A).

Now that the out-of-sample performance characterization problem has been reduced to the study of the
behavior of the quantities (16) to (21), we proceed in the following two sections with their asymptotic
analysis and consistent estimation.
A. Asymptotic performance analysis: an RMT approach



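Define $\gamma = \gamma_M = \frac{1}{N} \mathrm{tr}\left[E_M^2\right]$ and $\tilde{\gamma} = \tilde{\gamma}_M = \frac{1}{N} \mathrm{tr}\left[\tilde{E}_N^2\right]$, where $E_M = R_M \left(\tilde{\delta}_M R_M + \alpha_M I_M\right)^{-1}$ and
$\tilde{E}_N = T_N \left(I_N + \delta_M T_N\right)^{-1}$, with $\{\delta_M, \tilde{\delta}_M\}$ being the unique positive solution to the following system
of equations [55, Proposition 1]:

$$\delta_M = \frac{1}{N} \mathrm{tr}\left[R_M \left(\tilde{\delta}_M R_M + \alpha_M I_M\right)^{-1}\right], \qquad \tilde{\delta}_M = \frac{1}{N} \mathrm{tr}\left[T_N \left(I_N + \delta_M T_N\right)^{-1}\right]. \qquad (22)$$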
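
To make the fixed-point system (22) concrete, the following is a minimal numerical sketch, assuming the system reads $\delta_M = \frac{1}{N}\mathrm{tr}[R_M(\tilde{\delta}_M R_M + \alpha_M I_M)^{-1}]$ and $\tilde{\delta}_M = \frac{1}{N}\mathrm{tr}[T_N(I_N + \delta_M T_N)^{-1}]$. Only the eigenvalues of $R_M$ and the diagonal of $T_N$ enter the traces, so both are passed as vectors. The plain fixed-point iteration and the names `solve_delta` and `gammas` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def solve_delta(r, t, alpha, tol=1e-12, max_iter=10000):
    """Iterate the two equations in (22) until delta stops changing.

    r: eigenvalues of R_M (length M); t: diagonal entries of T_N (length N);
    alpha: the positive scalar alpha_M.
    """
    r = np.asarray(r, dtype=float)
    t = np.asarray(t, dtype=float)
    N = t.size
    delta = 1.0
    for _ in range(max_iter):
        delta_tilde = np.sum(t / (1.0 + delta * t)) / N
        new_delta = np.sum(r / (delta_tilde * r + alpha)) / N
        converged = abs(new_delta - delta) < tol
        delta = new_delta
        if converged:
            break
    delta_tilde = np.sum(t / (1.0 + delta * t)) / N
    return delta, delta_tilde

def gammas(r, t, alpha):
    """Return (gamma, gamma_tilde) = ((1/N) tr[E^2], (1/N) tr[E_tilde^2])."""
    r = np.asarray(r, dtype=float)
    t = np.asarray(t, dtype=float)
    delta, delta_tilde = solve_delta(r, t, alpha)
    N = t.size
    gamma = np.sum((r / (delta_tilde * r + alpha)) ** 2) / N
    gamma_tilde = np.sum((t / (1.0 + delta * t)) ** 2) / N
    return gamma, gamma_tilde
```

In the special case $R_M = I_M$, $T_N = I_N$ the system collapses to the quadratic $\alpha \delta^2 + (1 + \alpha - c)\delta - c = 0$ with $c = M/N$, which gives a convenient sanity check for the iteration.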

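Then, we have the following result characterizing the asymptotic behavior of the random variables (16)
to (21).

Theorem 1: (Asymptotic Deterministic Equivalents) Under Assumptions (As1) to (As3), the following
asymptotic equivalences hold true:

$$\xi_M^{(1)} \asymp \upsilon_M^T \left(\tilde{\delta}_M R_M + \alpha_M I_M\right)^{-1} \upsilon_M,$$
$$\xi_M^{(2)} \asymp \delta_M \nu_N^T T_N \left(\delta_M T_N + I_N\right)^{-1} \nu_N,$$
$$\xi_M^{(3)} \asymp 0,$$
$$\xi_M^{(4)} \asymp \frac{1}{1 - \gamma_M \tilde{\gamma}_M} \upsilon_M^T R_M^{1/2} \left(\tilde{\delta}_M R_M + \alpha_M I_M\right)^{-2} R_M^{1/2} \upsilon_M,$$
$$\xi_M^{(5)} \asymp 0,$$
$$\xi_M^{(6)} \asymp \frac{\gamma_M}{1 - \gamma_M \tilde{\gamma}_M} \nu_N^T T_N \left(\delta_M T_N + I_N\right)^{-2} \nu_N.$$

Proof: See Appendix B. ■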
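
As an informal illustration of what a deterministic equivalent means in practice, the sketch below compares Monte Carlo averages of two quadratic forms with candidate deterministic counterparts. It assumes diagonal $R_M$ and $T_N$, the sample matrix $\hat{\Sigma}_M = \frac{1}{N} Y_N Y_N^T + \alpha_M I_M$ with $Y_N = R_M^{1/2} X_N T_N^{1/2}$, and the equivalents $\upsilon_M^T(\tilde{\delta}_M R_M + \alpha_M I_M)^{-1}\upsilon_M$ and $(1 - \gamma_M\tilde{\gamma}_M)^{-1}\upsilon_M^T R_M^{1/2}(\tilde{\delta}_M R_M + \alpha_M I_M)^{-2} R_M^{1/2}\upsilon_M$ for the first and fourth quantities. All function names and parameter values are hypothetical.

```python
import numpy as np

def solve_delta(r, t, alpha, n_iter=2000):
    """Plain fixed-point iteration for (delta, delta_tilde), diagonal R and T."""
    N = t.size
    delta = 1.0
    for _ in range(n_iter):
        delta_tilde = np.sum(t / (1.0 + delta * t)) / N
        delta = np.sum(r / (delta_tilde * r + alpha)) / N
    delta_tilde = np.sum(t / (1.0 + delta * t)) / N
    return delta, delta_tilde

def deterministic_equivalents(r, t, alpha, ups):
    """Candidate equivalents of ups' S^{-1} ups and ups' S^{-1} R S^{-1} ups."""
    delta, delta_tilde = solve_delta(r, t, alpha)
    N = t.size
    gamma = np.sum((r / (delta_tilde * r + alpha)) ** 2) / N
    gamma_tilde = np.sum((t / (1.0 + delta * t)) ** 2) / N
    xi1 = np.sum(ups ** 2 / (delta_tilde * r + alpha))
    xi4 = np.sum(ups ** 2 * r / (delta_tilde * r + alpha) ** 2) / (1.0 - gamma * gamma_tilde)
    return xi1, xi4

def simulate(r, t, alpha, ups, trials=40, seed=0):
    """Monte Carlo averages of the same two quadratic forms."""
    rng = np.random.default_rng(seed)
    M, N = r.size, t.size
    acc = np.zeros(2)
    for _ in range(trials):
        X = rng.standard_normal((M, N))
        Y = np.sqrt(r)[:, None] * X * np.sqrt(t)[None, :]   # R^{1/2} X T^{1/2}
        S = Y @ Y.T / N + alpha * np.eye(M)
        q = np.linalg.solve(S, ups)                          # S^{-1} ups
        acc += np.array([ups @ q, q @ (r * q)])
    return acc / trials
```

For moderate dimensions (say $M = 150$, $N = 300$) the simulated averages and the deterministic values are expected to agree to within a few percent, with the gap shrinking as both dimensions grow at the same rate.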
Using Theorem 1, estimates of the out-of-sample performance of optimal sample mean-variance portfolios
based on the unconditional and conditional models in Section II are readily obtained. By means
of the previous asymptotic approximations in a practically more meaningful and relevant double-limit
regime (cf. Section IV), more accurate information about the underestimation and overestimation effects
of the portfolio risk and return, respectively, can be provided.

The previous result is of interest on its own for characterization purposes as well as for scenario
analysis in investment management. However, particularly for the calibration of unconditional models,
one might well also be interested in estimates of the previous quantities that are given in terms of the
available information, i.e., essentially, the data observation matrix. In the proposed asymptotic regime, it
follows from Theorem 1 that both $\xi_M^{(3)}$ and $\xi_M^{(5)}$ are negligible and therefore can be discarded for analysis
and decision purposes. While $\xi_M^{(1)}$ and $\xi_M^{(2)}$ are already given in terms of only observable data¹, the terms
$\xi_M^{(4)}$ and $\xi_M^{(6)}$ happen to be defined in terms of the unknown $R_M$. We next present a class of estimators of
(19) and (21), or equivalently their asymptotic deterministic equivalents provided by Theorem 1, which
are strongly consistent under the limiting regime in (As3).

B. Consistent estimation of optimal large dimensional portfolios 

The parameters defining the estimators in (fTOl ) and (fTTI ). i.e., {W^ TVjWs^Af} and {5,p}, effectively 
represent a set of degrees-of-freedom with respect to which the out-of-sample performance of a portfolio 
construction can be improved. For the calibration of unconditional models by means of optimizing the 
estimator parameterization, only the available sample data can be used in practice in order to select the 
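previous set of parameters. To that effect, from the definition of the quantities (19) and (21) and the
discussion above, the estimation of $\xi_M^{(4)}$ and $\xi_M^{(6)}$ is required. The naive approach is based on the plug-in
or conventional estimator of $\xi_M^{(4)}$, henceforth denoted with the subscript "cnv" by $\hat{\xi}_{\mathrm{cnv},M}^{(4)}$, which is given
by replacing the unknown theoretical covariance matrix by the SCM, i.e.,

$$\hat{\xi}_{\mathrm{cnv},M}^{(4)} = \upsilon_M^T \left(\tfrac{1}{N} Y_N Y_N^T + \alpha_M I_M\right)^{-1} \tfrac{1}{N} Y_N Y_N^T \left(\tfrac{1}{N} Y_N Y_N^T + \alpha_M I_M\right)^{-1} \upsilon_M. \qquad (23)$$

Additionally, let $\hat{\xi}_{\mathrm{cnv},M}^{(6)}$ denote the "plug-in" estimator of $\xi_M^{(6)}$, and notice that

$$\hat{\xi}_{\mathrm{cnv},M}^{(6)} = \tfrac{1}{N} \nu_N^T Y_N^T \left(\tfrac{1}{N} Y_N Y_N^T + \alpha_M I_M\right)^{-1} \tfrac{1}{N} Y_N Y_N^T \left(\tfrac{1}{N} Y_N Y_N^T + \alpha_M I_M\right)^{-1} Y_N \nu_N$$
$$= \nu_N^T \left(\tfrac{1}{N} Y_N^T Y_N\right)^2 \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-2} \nu_N. \qquad (24)$$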
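
The second equality in (24) moves the computation from the $M \times M$ matrix $\frac{1}{N} Y_N Y_N^T$ to the $N \times N$ matrix $\frac{1}{N} Y_N^T Y_N$, which rests on the push-through relation $(\frac{1}{N} Y Y^T + \alpha I_M)^{-1} Y = Y (\frac{1}{N} Y^T Y + \alpha I_N)^{-1}$. The sketch below checks numerically that the two forms agree on a random instance; the function name, dimensions and seed are illustrative assumptions.

```python
import numpy as np

def xi6_plugin_two_ways(Y, nu, alpha):
    """Evaluate the plug-in quantity in (24) in its M x M and N x N forms."""
    M, N = Y.shape
    A = Y @ Y.T / N                                   # M x M sample matrix
    B = Y.T @ Y / N                                   # N x N companion matrix
    S_inv = np.linalg.inv(A + alpha * np.eye(M))
    v = Y @ nu
    m_form = v @ S_inv @ A @ S_inv @ v / N
    G = np.linalg.inv(B + alpha * np.eye(N))
    n_form = nu @ B @ B @ G @ G @ nu
    return m_form, n_form

rng = np.random.default_rng(1)
Y = rng.standard_normal((8, 12))
nu = np.ones(12) / np.sqrt(12)                        # unit-norm weighting vector
m_form, n_form = xi6_plugin_two_ways(Y, nu, alpha=0.3)
```

Which of the two forms is cheaper depends on whether $M$ or $N$ is smaller, since each requires inverting a matrix of the corresponding size.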

Before presenting the main result of this section, we provide an intermediate result that will be required 
for the statement of the improved estimators. 

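Proposition 1: Under Assumptions (As1) to (As3), a generalized consistent estimator of $\delta_M$, denoted
by $\hat{\delta}_M$, is given by the unique positive solution to the following equation:

$$\frac{1}{N} \mathrm{tr}\left[\tfrac{1}{N} Y_N Y_N^T \left(\tfrac{1}{N} Y_N Y_N^T + \alpha_M I_M\right)^{-1}\right] = \hat{\delta}_M \frac{1}{N} \mathrm{tr}\left[T_N \left(I_N + \hat{\delta}_M T_N\right)^{-1}\right].$$

Proof: The proof follows from the convergence result (40) in Proposition 2 for $\Theta_M = \frac{1}{N} I_M$ and
$z = -\alpha_M$. ■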
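
A data-driven estimate of $\delta_M$ can be sketched numerically as follows. The sketch assumes that the estimating equation takes the form $\frac{1}{N}\mathrm{tr}[\frac{1}{N} Y_N Y_N^T(\frac{1}{N} Y_N Y_N^T + \alpha_M I_M)^{-1}] = \hat{\delta}_M \frac{1}{N}\mathrm{tr}[T_N(I_N + \hat{\delta}_M T_N)^{-1}]$, which is what the convergence result (40) suggests for $\Theta_M = \frac{1}{N} I_M$ and $z = -\alpha_M$; the bisection solver and all names are illustrative and not the authors' procedure.

```python
import numpy as np

def delta_hat(Y, t, alpha, n_bisect=200):
    """Solve the assumed estimating equation for delta by bisection."""
    M, N = Y.shape
    lam = np.clip(np.linalg.eigvalsh(Y @ Y.T / N), 0.0, None)
    lhs = np.sum(lam / (lam + alpha)) / N             # observable left-hand side

    def f(d):
        return d * np.mean(t / (1.0 + d * t)) - lhs   # increasing in d, f(0) < 0

    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:                                # bracket the root
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("no positive root found")
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta_true(r, t, alpha, n_iter=2000):
    """Deterministic delta from the fixed-point system, diagonal R and T."""
    N = t.size
    delta = 1.0
    for _ in range(n_iter):
        delta_tilde = np.sum(t / (1.0 + delta * t)) / N
        delta = np.sum(r / (delta_tilde * r + alpha)) / N
    return delta

def draw_data(r, t, seed=0):
    """One realization of Y = R^{1/2} X T^{1/2} with standard Gaussian X."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((r.size, t.size))
    return np.sqrt(r)[:, None] * X * np.sqrt(t)[None, :]
```

Only the data matrix, the known weights `t` and `alpha` are needed by `delta_hat`; the unknown spectrum `r` is used solely to produce the reference value for comparison.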

¹This is not the case for $\upsilon_M = \Sigma_0^{-1/2} \mu$, but the consistent estimation of the corresponding terms can still be
handled straightforwardly by rearranging terms.
The following theorem provides estimators of $\xi_M^{(4)}$ and $\xi_M^{(6)}$ which are consistent in the double-limit
regime in Assumption (As3).

Theorem 2: (Generalized Consistent Estimators) Under Assumptions (As1) to (As3), we have the
following consistent estimators for $\xi_M^{(4)}$ and $\xi_M^{(6)}$:

^lce,M = OM^lnljU + ^M, (26) 

where $\hat{\xi}_{\mathrm{cnv},M}^{(4)}$ and $\hat{\xi}_{\mathrm{cnv},M}^{(6)}$ are defined as in (23) and (24), respectively, and



iv I Iat + (5a/ Tat 



— tr 

AT 



N 



with $\hat{\delta}_M$ being given by Proposition 1.

Proof: See Appendix C. ■
Remark 1: The asymptotic equivalents and consistent estimators of $\xi_M^{(4)}$ and $\xi_M^{(6)}$ in Theorem 1 and
Theorem 2, respectively, generalize previous results on the characterization of quadratic forms depending
on the eigenvalues and eigenvectors of the sample covariance matrix (see ||56l Proposition 1],|I571 Chapter 
4] and EH Theorem 1]). 

Remark 2: We notice that Theorem 1, Proposition 1 and Theorem 2 hold verbatim if the vectors vm,
vn and the matrices Xat, Ra/> "^m, Tat have complex-valued entries. 

IV. Numerical validations 

In this section, we provide the results of some simulations illustrating the power of the proposed
analytical framework. In particular, we consider the construction of a GMVP based on synthetic data
modeling a universe of M = 50 assets (e.g., Euro Stoxx 50) with annualized volatility (standard deviation)
between 20% and 30%. For simple illustration purposes, we have assumed that the expected return is
negligible compared to the asset covariance matrix, and so it has not been estimated. We run simulations
considering estimation windows ranging from 20 to 200 return observations. Specifically, we measure
the accuracy of approximating the out-of-sample (realized) variance of a GMVP by its asymptotic
deterministic equivalent (ADE) given in terms of the investment scenario parameters (cf. Section III-A),
the conventional (CNV) implementation based on the naive replacement of the unknown parameters by
[Figure 1: plot of the approximation error for the curves ADE, GCE and CNV; horizontal axis: number of observations (N).]



Fig. 1. Approximation of realized out-of-sample variance of a GMVP for fixed calibrating parameters 



their sample counterparts, and its generalized consistent estimator (GCE) derived in Section III-B. Monte
Carlo simulations (10'^ iterations) are run for three different scenarios, for which the approximation error 
in relative terms and in percentage is provided, i.e., 100 x \ap (wgmvp) — <5"p (wgmvp)I /crp (wgmvp), 
where ap (wgmvp) denotes here any of the three approximations. Moreover, in all cases we have 
considered a covariance matrix shrinkage estimator with = lAf> and parameters p and Ws,Ar to 
be calibrated for optimal performance. In the first experiment, we consider fixed values of the calibrating 
parameters given by the coefficient p = 0.05 and a diagonal matrix T = Ws a? given by half of its entries 
being equal to t = 0.75 and the other half equal to 2 - t. Figure 1 shows the relative approximation
error for each method. In the two other experiments, we consider the construction of GMVPs given 
by the calibration of the optimal (for minimum variance) parameters p or Ws,Ar, respectively, where 
in each case the other parameter has been fixed to its value in the first experiment. Figures [2] and [3] 
show the results for the calibration of p and Ws.at, respectively. In our simulations, we applied a 
naive optimization scheme to find the optimal parameters in these simple illustrative examples, as we do 
not pursue dealing with practical optimization issues in this work, but rather focus on a representative 
[Figure 2: horizontal axis: number of observations (N).]

Fig. 2. Approximation of realized out-of-sample variance of a GMVP for fixed ρ and optimized Ws,N



validation of the statistical results that we have derived; efficient optimization algorithms based, e.g., 
on successive convex approximation (see [59]), are left as future work and are now under investigation
by the authors. From the simulation outputs, it is clear that the performance of the proposed consistent 
estimators is decreased whenever calibration of the parameters has to be performed, essentially due to 
the variability (fluctuations) of the estimators. An extensive simulation campaign is outside the scope of 
the section and the paper, but a reduction of this effect can be observed as expected by increasing for 
instance the number of assets in the universe (e.g., in the same illustrative line, M = 300 for the index 
Euro Stoxx 300). The use of information about the fluctuations of the estimators in order to improve the 
performance of the method is currently under investigation. 
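
The kind of experiment described in this section can be sketched in a few lines. The following is a simplified illustration only: it assumes uncorrelated assets with volatilities drawn between 20% and 30%, uniform sample weights, a shrinkage target equal to the identity with coefficient `rho`, and it reports only the conventional (CNV) approximation of the realized variance. The function name and every parameter default are hypothetical and do not reproduce the authors' exact setup.

```python
import numpy as np

def gmvp_experiment(M=50, N=100, rho=0.05, seed=0):
    """One draw of a GMVP built from a shrinkage covariance estimate.

    Returns (realized variance, CNV in-sample variance, best attainable
    variance, relative CNV error in percent).
    """
    rng = np.random.default_rng(seed)
    vols = rng.uniform(0.20, 0.30, size=M)           # annualized volatilities
    Sigma = np.diag(vols ** 2)                        # assumption: uncorrelated assets
    Y = vols[:, None] * rng.standard_normal((M, N))   # zero-mean Gaussian returns
    scm = Y @ Y.T / N                                 # sample covariance matrix
    Sigma_hat = (1.0 - rho) * scm + rho * np.eye(M)   # shrinkage towards identity
    ones = np.ones(M)
    w = np.linalg.solve(Sigma_hat, ones)
    w = w / (ones @ w)                                # GMVP weights sum to one
    realized = w @ Sigma @ w                          # out-of-sample variance
    cnv = w @ scm @ w                                 # naive plug-in of the SCM
    optimal = 1.0 / np.sum(1.0 / vols ** 2)           # GMVP variance with known Sigma
    rel_err = 100.0 * abs(cnv - realized) / realized
    return realized, cnv, optimal, rel_err

realized, cnv, optimal, rel_err = gmvp_experiment()
```

Because the weights sum to one, the realized variance can never fall below the value attained with the true covariance matrix, which provides a simple consistency check on the simulation.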

V. Conclusions 

In this paper, we have provided an asymptotic framework for the analysis of the consistency of arbitrarily
large sample mean-variance portfolios that are constructed on the basis of improved Bayesian or shrinkage 
estimation and weighted sampling. To that effect, we have resorted to recent contributions on the theory 
of the spectral analysis of large random matrices, based on a double-limit regime that is defined by both 
Fig. 3. Approximation of realized out-of-sample variance of a GMVP for fixed Ws,N and optimized ρ

the number of samples and the number of portfolio constituents going to infinity at the same rate. In spite
of its asymptotic nature, by keeping both the return observation size and dimension to be of the same
order of magnitude, our results have proved to successfully describe the performance of sample portfolios
under realistic, finite-size situations of interest. Furthermore, based on the previous characterization of
the estimation risk, corrections of the level of risk underestimation and return overestimation of specific
portfolio constructions have been proposed so as to optimize the out-of-sample performance. Our proposed
calibration rules represent a sensible portfolio choice improving on standard, usually overly optimistic
Sharpe-based investment decisions.

Appendix A 
Technical preliminaries 

A. Further definitions and auxiliary relations 

We first recall the Sherman-Morrison-Woodbury formula, or matrix inversion lemma, which will be 
used repeatedly in the sequel, i.e., 

$$(UHV + A)^{-1} = A^{-1} - A^{-1} U \left(H^{-1} + V A^{-1} U\right)^{-1} V A^{-1}. \qquad (27)$$
In particular, the next identity for rank-augmenting matrices follows from (27):

$$\left(A + u v^T\right)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}. \qquad (28)$$
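
The rank-one update (28) is what allows a resolvent to be expressed through the resolvent with one column of the data matrix removed. As a quick numerical illustration, the sketch below applies the update to a random instance and compares it with a direct inversion; the helper name `sherman_morrison_inverse` and the test matrices are illustrative assumptions.

```python
import numpy as np

def sherman_morrison_inverse(A_inv, u, v):
    """Inverse of (A + u v^T) obtained from A^{-1} via the rank-one update (28)."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(0)
n = 6
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)                 # symmetric positive definite test matrix
u = rng.standard_normal(n)
u = u / np.linalg.norm(u)                   # unit norm keeps 1 + v^T A^{-1} u away from zero
v = rng.standard_normal(n)
v = v / np.linalg.norm(v)
direct = np.linalg.inv(A + np.outer(u, v))
updated = sherman_morrison_inverse(np.linalg.inv(A), u, v)
max_abs_diff = np.max(np.abs(direct - updated))
```

The update costs a few matrix-vector products once $A^{-1}$ is available, instead of a fresh inversion.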



Let Q = (^^YnY% - zImJ , and also Q(„) = (^^Y(„)Y^j - zImJ , where Y(„) G 
defined by extracting the nth column from the data matrix Y. In particular, using (|28l ). we get 

Using the definitions in Section IIII-AI we observe that 



A/xJV-l 



IS 



R-Af ( Sm'^M + OAfl 



Sm - ^MlM = "Af-^ tr 

1 r _2 

Additionally, the following definitions will be useful for our derivations: 



M 



(29) 

(30) 
(31) 



Cm 
Cm 



1 



1 



■tr 



1 - IMIM N 
IM 1 
1 - IMIM N 



R-A/ Sm'^M + "Afl 



Af 



tr 



R-Af ( (^AfR-Af + "Afl 



Af 



In particular, notice that 

Cm = — 7mCm- 
Lemma 1: The following relations hold true: 



(5m + omCm 
Sm — cumCm 



1 



1 



1 - 7M7M N 
7Af 1 



tr 



Tat (Iat + 5mTn) 



tr 



Tat (Iat + 5mT 



N 



\-2 



(32) 

(33) 
(34) 



1 - 7Af7A/ N 

Proof: We first show that 6m - oa/Ca/ = 7a/ (^hi + "a/Ca/) > and then prove that Sm + "mCa/ = 
(1 — 7Af7A/)~^ (Sm — SmJa-i^, so that the result follows finally by using (|3T]) . Let us handle the first 
equaUty. Using the definitions above, by simple partial fraction decomposition, we get 



Sm - 0!mCm = 7m (^hi - (XmImCm^ > 



and the first equality follows by using (|32]) . Regarding the second equality, using the definition of Cm 
along with ( [30l ) we notice that 

1 



"mC 



M 



Sm7m - Sm , 



1 - IMIM 

and the equality follows by introducing the previous expression in Sm + o-mCm and finally rearranging 
terms. ■ 
B. Some useful stochastic convergence results



The following two results will be useful to prove the vanishing characteristic of both $\xi_M^{(3)}$ and $\xi_M^{(5)}$.

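Lemma 2: (Burkholder's inequality) Let $\{\mathcal{F}_l\}$ be a given filtration and $\{X_l\}$ a martingale difference
sequence with respect to $\{\mathcal{F}_l\}$. Then, for any $p \in (1, \infty)$, there exist constants $K_1$ and $K_2$ depending
only on p such that [60, Theorem 9]

$$K_1 \, \mathrm{E}\left[\left(\sum_{l=1}^{L} |X_l|^2\right)^{p/2}\right] \le \mathrm{E}\left[\left|\sum_{l=1}^{L} X_l\right|^{p}\right] \le K_2 \, \mathrm{E}\left[\left(\sum_{l=1}^{L} |X_l|^2\right)^{p/2}\right].$$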

The result above as well as the next were originally proved for real variables. Extensions to the complex 
case are straightforward. The following result can be shown by using the martingale convergence theorem 
|[6T1 . We provide a sketch of the proof, which essentially follows the exposition in f6T, Theorem 20.10] 
(see also f63^, Corollary 3] and references therein). 

Theorem 3: Let {Ti} be a given filtration and {Xi} a square-integrable martingale difference sequence 
with respect to {Ti}. If 



sup 

L> 



^ 1=1 



then 



1 



z-i 



0, 



< 00, 



almost surely, as L — > cxd. 

Proof: Define Tl = L^^l"^ '^'^^ is a square-integrable martingale with respect to 

In particular, we have 

supE[|rL|] < supE^/2 r|2.^|2l ^ 

L>\ L>1 L -I 

the last inequality following from Burkholder's inequality in Lemma 2. Then, by the martingale convergence
theorem we have that $T_L$ converges almost surely as $L \to \infty$ to an integrable random variable,
and the result follows by Kronecker's lemma (see, e.g., [61, p. 31]). ■



In the sequel, the matrix $\Theta_M \in \mathbb{R}^{M \times M}$ will denote an arbitrary nonrandom matrix having trace norm
bounded uniformly in M. Notice that $\|\Theta_M\|_F \le \|\Theta_M\|_{\mathrm{tr}}$, and so the Frobenius norm of $\Theta_M$ is also
uniformly bounded. For instance, if $Z_M \in \mathbb{R}^{M \times M}$ is an arbitrary nonrandom matrix with uniformly
bounded spectral norm (in M), then in the cases $\Theta_M = \frac{1}{M} Z_M$ and $\Theta_M = \upsilon_M \upsilon_M^T$, we have $\|\frac{1}{M} Z_M\|_F =
\frac{1}{M} \left(\mathrm{tr}\left[Z_M Z_M^T\right]\right)^{1/2} = O(M^{-1/2})$ and $\|\upsilon_M \upsilon_M^T\|_F = \|\upsilon_M\|^2 = O(1)$, respectively. The following
theorem will be instrumental in the proof of our results. The theorem is originally stated in a more general
form for complex-valued matrices but applies verbatim for matrices with real-valued entries [64].
Theorem 4: Under Assumptions (As1) to (As3), for each $z \in \mathbb{C} - \mathbb{R}^+$,

-1' 



tr 



tr 







M 







N 



1 

1 



YatY^ - zIm 



yIYn - zIn 



tr 



tr 



0M (XA/R. - zIm) 



-1 



0iv (Ijv + eA/T) ^ 



(35) 
(36) 



where {cm = CAf (^) ,a:A/ = xm {z)} is the unique solution in C+ of the system of equations: 



CAf = i/ tr 
XM = ;^ tr 



R(xAfT - 2;lAf) ^ 

Moreover, given a symmetric nonnegative definite matrix Am G C^^^*^, we also have, for each z G 

C-M+, 

-1" 



tr 



0Af ( Am + ^YjvY^ - ^Im 



tr 



0A/ (AAf + XM^M - zIm) ^ 



(37) 



In particular, notice that |(5Af,5Af| coincides with {cm = em (z) ,xm = xm (z)} evaluated at z = 
—aM (see 1551 Proposition 1]). Moreover, we remark that where C,m = g'm ^^'^ Cm — ^M' where 
^'m — ^'m i^) ^'m — ^'m (-^) the derivatives wrt. z of, respectively, cm and xm, namely given by 



Cm 



x'm 



^tr 

m 


Ra/ (xA/RAf - ^;Im) ^ 




^ m 


R|f (xatRa/ - ^;Im) ^ 


itr 


(Iw + eA/Tjv) ^ 



^tr 

Af " 


Km (xa/Ra/ - ^;Im) ^ 


^tr 


(Iat + eAfTAf) ^ 




J- Af 


Kli {xmKm - zIm) ^ 


^tr 

AT 


(liv + eA/Tiv) ^ 



(38) 



(39) 



Along with Theorem 4, the following proposition will also be a key element in proving Theorem 1
and Theorem 2.

Proposition 2: Let the definitions and assumptions on the data model specified until now hold. Then,
for each $z \in \mathbb{C} - \mathbb{R}^+$,



tr 



tr 



1 



0A/ ^Y^tYI; ( ^YAfYir - Zl 



M—y-NT^N 
1 - 



1 - 



T 



N 



0jv-Y^Yjv I ^Y^^Ya^ - zl 



AT I AT — ZLM 



N 



jy X N — ZlAT 



Xm tr 



CM tr 



0AfR(xA/R - zImY 



@nT (In + cmT' 



-1 



(40) 



(41) 



where $\{e_M = e_M(z), x_M = x_M(z)\}$ are defined as in Theorem 4. Moreover, we also have, for each
$z \in \mathbb{C} - \mathbb{R}^+$,



tr 



0m^Ya,yI; ( -Y/vY 



tr 



1 



1 



N 
1 - 



AT I AT 



e^-Y^YA. ( ^Y^Y^v - 



zl 



M 



-2 



-2 



(XM - zx\,j) tr ©A/R-M (xAfR-M - zIm) ^ 

4/ tr 



@NTN{lN + eMTN) ^ 



, (42) 
(43) 



N \N 

where $e'_M$ and $x'_M$ are given by (38) and (39), respectively.

Proof: The proofs of (40) and (41) follow the same lines of reasoning. We show (41). First, notice

that 



tr 



0^^Y^Y^(^Y^Y^-zI;v 



1 

N 



tr 



M 



Moreover, using the matrix inversion lemma in (27) we get

and, then, write 

tr @n^yIYn (^^yJjYn - zIn^ =tr[0iv] + ztr 
Now, Theorem m yields 

tr 

Then, from (44), we finally have that
tr 



1 



I.-^^Y-Y. 



0^ ( ^Y^Y^ - zIn 



(44) 



0iv I ^y;^Y^ - zIn 



tr 



0^(l7V + eMT) ^ 



0iV^Y^Y;v (^^N^N - zIn 



-1 



tr 



@N (ItV - (i-N + EAfT 



,-1 



CA/ tr 



0jvT(Iiv + eA./T)-^ 



Regarding the proof of (42) and (43), we first notice that

-2 



tr 



eA^^Y^Y?;flY^Y^-zlAf 



tr 



©^Y^Yat f — Y^Yat - zIn 



-2 





















1" 


5z 





eAf^Y^Y^f^Y^Y^-zlAf 



0Af ^Y^Y^ (^^nYn - zIm 



-1 



-1 



Moreover, the almost sure convergence stated in (40) and (41) is uniform on $\mathbb{C} - \mathbb{R}^+$, and therefore
the convergence of the derivatives holds by the Weierstrass convergence theorem [65] (see alternatively
the argument in [66, Lemma 2.3] based on Vitali's theorem about the uniform convergence of sequences of
uniformly bounded holomorphic functions towards a holomorphic function [67], [68]). ■
Appendix B
Proof of Theorem 1

Next, we separately prove the convergence of each term in the statement of the theorem.

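A. The terms $\xi_M^{(1)}$ and $\xi_M^{(2)}$

In particular, the asymptotic deterministic equivalent of $\xi_M^{(1)}$ follows readily by Theorem 4. Regarding
$\xi_M^{(2)}$, after observing first that

$$\xi_M^{(2)} = \nu_N^T \tfrac{1}{N} Y_N^T Y_N \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-1} \nu_N,$$

the result is obtained by applying (41) in Proposition 2.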

B. The terms $\xi_M^{(3)}$ and $\xi_M^{(5)}$

We recall that l^^"* = v^jYI^^vm = ;^i^m^m ^^'^^^ ^^'^ write 

J^'^M^M^NVn = J^vljQ(n)^NVN - 9n-^^^MQ(n)yny^Q{n) YATt^iV (45) 

where we have defined g„ = (l + ;^y^Q(n)yn) ^- Moreover, recall also that ^fj = v'j^'E]^'RM'^^4'^^M 
■^vJ^'I]^j'R]\i't]^jY]\fVi\f, and consider first the following notations: 

Xj,n = ^y^Q(n)Zj(„)yri " [Qh^jh] > 

for j = 1, 2, with Zj(„) being arbitrary M x M dimensional matrices, possibly random but not depending 
on the x„, and such that supA^>i ||Zj(„)|| < +00; in particular, Z^j-^) = Im and Z2(n) = Q(,„)Rm- Then, 
observe that we can write ^^^^ and as, respectively, 

1 - 1 ^ 1 ^ 

n=l n=l 

1 ^ y 
+ ]V ^ Xl,ntngn [^'Afln C^Q(n) Zl(„)yn, (46) 

n=l 

where the first term on the RHS follows from jjvJjQ(^,n^Y]\rV]\r, and, similarly, 

1 ^ 1 ^ 1 ^ 

-vI^^^IRm^mYnVn = ]^ ^5n)^" " iV ^ ^^")''" " iV ^ ^5")^" 

n=l n=l n=l 



N N 
^ — ^ T ~ ^ — ^ T 

AT / ^ Xl,ntnQn i'^Nln 



n=l n=l 



1^1 1 1 

+ Z^inQn [^Nln -^^AfQ{n)Zl(„)yn-^yn Q(n)Z2(n)yn-^y„ Q{n)Zl(„)yn, 
n=l 



(47) 
where the following definitions apply:



Zl(„, 
Z2{n 
Z3{n 

Z5(n 



- 1/2 



in^n [■l'Af]„ — tr [Q(„)Zi(„)] R^^ Zi(„)Q(„)t>M, 

- 1/2 — 

["^^A^ln^M Z2(n)Q{n)'5M> 

in9n [^'Afln [Q(„)] R]^(^Z2(n)Q(n)^M, 



(n 



N 



We now prove that the terms of the form X]^=i ^fe(ra)-^" vanish almost surely. To see this, we 
further define the following two L = MN dimensional vectors, namely, x - 



T 
Xl 



X 



N 



, and 



-■k{N) 



(48) 



, /c = 1, 2, 3, 4, 5. Then, we notice that 

N , -, L 

where we have defined rjk^i = Z'^^Xi, with Z^^i and X; being the /th entries of and x, respectively. 
In particular, if Qi is the cr-field generated by the random variables {Xi}, then notice that forms a 



N L 

77 z^(„)X„ = — X = -y= 2^ 

n=l ^ 1=1 



martingale difference sequence with respect to the filtration {Gi}- Indeed, E Z]^j^Xi 



we notice that E 



\'nk,i\ 



Qi- 



E 



\Zk,i\ 



n-i 



0. Then, 



is bounded by assumption and so by Theorem 3 we
have that (48) vanishes almost surely.

We now handle the last term on the RHS of equation (46) together with the last three terms on the
RHS of equation (47). From the developments in [64] it follows that, for a sufficiently large p,



E 



tnqn [^N]n "J^M Q (n) Z j (n) Y n 



|2p" 



< 



2p 



(49) 



|Im{z}|^P' 

where Kp is a constant depending on p but not on M, N which may take different values at each 
appearance, and 



E 



|2p 



< 



(50) 



NP\lm{z}\'^f'' 

Then, using (49) and (50), and applying first Minkowski's and then the Cauchy-Schwarz inequalities, we

get = 1,2) 



N 



Xi,ntnqn [^'Ar]n ■'^MQ(n) Zj(„)yr 



n=l 



1 



N 



~ IP 



1/p 



AT 



< 



— Ve 



l/2p 



,n=l 
|2p" 



]gl/2p 



\Xi 



|2p 



< 



^n=l 



X'Pl'^ |Im{z}|^P' 
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Furthermore, let Xn = j!Y.n=i'^nql[vN]nJ!'^li^{n)'^i{n)ynjjy^ and, 
by using Jensen's inequality along with (l49l) . observe that 



E 



A'' ^1 A'' 

■^rY^n < ^ y E [\Xn\''] < max E [| A^^l^] < -J— 

jV ^ " - N ^ LI nl J _ -^^^^^^ LI nl J _ ^^^^ 



Kr, \z\ 



|Im {z}| 



e > 0, 



n=l J n=l 

for sufficiently large p and well-chosen q and r, where the constants depend on p but not on M, N. Then,
the almost sure convergence to zero of the four terms for each $z \in \mathbb{C}^+$ follows by the Borel-Cantelli lemma.
Finally, convergence on the real nonnegative axis follows by an argument based on Montel's normal family
theorem (see, e.g., Section 4 in [64]).



C. The terms $\xi_M^{(4)}$ and $\xi_M^{(6)}$



Observe that 



-1/2 



d ( _ _i/2 



1/2 



dz 

Furthermore, using (37) in Theorem 4 we get



r T —1/2 / —1 T 

— yUf^Hj^ (aAfRjvf + XivTivXTv — zIm 



I^jV/Rm (ctAfRA/ +XArTArX^ -zIm) R^f l^A/ ^A/Rm (^"AfRAf + (^^^y - I 

where, for each z outside the real positive axis, jej^-* (z) , is the unique solution to the system: 



2=0 



R 



A/^ ''^A//, 



4?W = ^tr 



(4) / \ -1 



OAfRA/ + (2:) - Ia4 



Ttv (Itv + (z) Tjv 



-1 



Then, using 

4^0) = - 
we finally get 



l-4?(0))^tr 



Rm ( ^^M (0) + OLM^M 





1 











T^(l,v + e£\0)T;v 



d_ 

dz 



-L'T/fR 



Af-^A/ 



-1/2 



aA/RA , / + ( a; 



^A/ 



Z I Ia/ 



R 



-1/2 



Af 



'L'A/ 



2=0 
where



l-4)'(0) = l + e2)'(0)ltr 



n(l^ + e£)(0)T 



N 



l+(l-xS)'(0))ltr 





1 


)1 


— tr 




N 



T^(ljv + ei?(0)T 



1 



N 



tr 



(ljv + eit)(0)T^ 



C(6) 



N 



tr 



R-i/ ( (0) + aM^M 



Let us now deal with Q/. We recall that 



(6) T —1 ^ —1 ^ T 1/2 T f ^ T — 



Let Ajvf = Am (f) = um'^jJ^ + ilAf> with t > being a real positive scalar, and observe that 



1 T -iV 



1/2 



d_ 

dt 



»,Trpl/2^T 



XivTivX^ + Am (i) ) XArTti^t^Ar 



t=0 



Furthermore, using the matrix inversion lemma in (27), we write

t]IV + Am (t)) ^NT]i^ = In- (In + T]iV (omRm + *Im) X^vT 



■N " 

and so we have 



1/2 



1^;?; (l,v + T^v^'x^ (aA/RM + *Ia/)"' XtvT;^/') -ujv} 



9f 



t=0 



Now, using Theorem 4 we get



v% (J^T]iV {aM'R]^^ +tlM) 'XjvTjy/' + lTvj VN-vl(xf}{-l)TN + lN) 'v^, 
where ^xfj (—1) = (t) , e^^) (—1) = e^^j is the solution to the following system of equations: 



T^(xf^)(t)Tiv + lAf 



omRm + + 4? (*)) 



Finally, notice that 

di 



-1 



t=0 



-xtf (0) (x£) (0) + I^) 



-2 
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where xf) (0) = ;^ tr 



-S'(0) = (l + eif(0))ltr 



l-4?'(0)^tr 



, and 



(e^^ (0) Rjv^ + omIm 
T;v (xf) (0) T;v + Im 





Rm (4? (0) Rm + omIm) 






(4? (0) R.M + "mIm) 




T,v (3;^ (0) + Im) 



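Appendix C
Proof of Theorem 2

We first show (25), i.e.,

$$\upsilon_M^T \tfrac{1}{N} Y_N Y_N^T \left(\tfrac{1}{N} Y_N Y_N^T + \alpha_M I_M\right)^{-2} \upsilon_M \asymp \frac{1}{N} \mathrm{tr}\left[T_N \left(I_N + \delta_M T_N\right)^{-2}\right] \frac{1}{1 - \gamma_M \tilde{\gamma}_M} \upsilon_M^T R_M^{1/2} \left(\tilde{\delta}_M R_M + \alpha_M I_M\right)^{-2} R_M^{1/2} \upsilon_M.$$

Using (42) in Proposition 2 with $\Theta_M = \upsilon_M \upsilon_M^T$ and $z = -\alpha_M$, we have that (notice that $x_M - z x'_M \big|_{z = -\alpha_M} =
\tilde{\delta}_M + \alpha_M \tilde{\zeta}_M$)

$$\upsilon_M^T \tfrac{1}{N} Y_N Y_N^T \left(\tfrac{1}{N} Y_N Y_N^T + \alpha_M I_M\right)^{-2} \upsilon_M \asymp \left(\tilde{\delta}_M + \alpha_M \tilde{\zeta}_M\right) \upsilon_M^T R_M \left(\tilde{\delta}_M R_M + \alpha_M I_M\right)^{-2} \upsilon_M,$$

and the proof follows by (33) in Lemma 1.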

Let us now handle ( |26l ). We want to prove that, in effect, 

2 



N 



Tat (Iat + (5m T 



-2 



^M ( ^^nY% 



^YnYI + omIm ] VM - SIjvIjT^ ( In + -5mT ) vm 



IM 



1 - IMIM 



v% {6mTn + In) vn- 



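First, observe that

$$\nu_N^T \left(\tfrac{1}{N} Y_N^T Y_N\right)^2 \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-2} \nu_N = \nu_N^T \tfrac{1}{N} Y_N^T Y_N \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-1} \nu_N - \alpha_M \tfrac{1}{N} \nu_N^T Y_N^T Y_N \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-2} \nu_N.$$

Moreover, asymptotic deterministic equivalents of the two terms on the RHS can be found by (41) in
Proposition 2 with $\Theta_N = \nu_N \nu_N^T$ and $z = -\alpha_M$ as

$$\nu_N^T \tfrac{1}{N} Y_N^T Y_N \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-1} \nu_N \asymp \delta_M \nu_N^T T_N \left(I_N + \delta_M T_N\right)^{-1} \nu_N$$

and

$$\tfrac{1}{N} \nu_N^T Y_N^T Y_N \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-2} \nu_N \asymp \zeta_M \nu_N^T T_N \left(I_N + \delta_M T_N\right)^{-2} \nu_N.$$

Then, by rearranging terms we can write

$$\nu_N^T \left(\tfrac{1}{N} Y_N^T Y_N\right)^2 \left(\tfrac{1}{N} Y_N^T Y_N + \alpha_M I_N\right)^{-2} \nu_N \asymp \left(\delta_M - \alpha_M \zeta_M\right) \nu_N^T T_N \left(I_N + \delta_M T_N\right)^{-2} \nu_N + \delta_M^2 \nu_N^T T_N^2 \left(I_N + \delta_M T_N\right)^{-2} \nu_N,$$

and the result follows finally after straightforward algebraic manipulations by (34) in Lemma 1.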